Automatic Acquisition of a High-Precision Translation Lexicon from Parallel Chinese-English Corpora

نویسنده

  • Zhao-Ming Gao
چکیده

This paper presents a hybrid approach to deriving a translation lexicon from unaligned parallel Chinese-English corpora. Two types of information, namely, proximity and document-external distributions of word pairs, are proposed to enhance the precision of the translation lexicon derived from statistical and dictionary-based methods. The former can identify translations of Chinese compounds, while the latter can filter out spurious word correspondences. Both methods achieve more than 94% precision.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Data-driven Amharic-English Bilingual Lexicon Acquisition

This paper describes a simple approach of statistical language modelling for bilingual lexicon acquisition from Amharic-English parallel corpora. The goal is to induce a seed translation lexicon from sentence-aligned corpora. The seed translation lexicon contains matches of Amharic lexemes to weekly inflected English words. Purely statistical measures of term distribution are used as the basis ...

متن کامل

Large - Scale Automatic Extraction of anEnglish - Chinese Translation

We report experimental results on automatic extraction of an English-Chinese translation lexicon, by statistical analysis of a large parallel corpus, using limited amounts of linguistic knowledge. To our knowledge, these are the rst empirical results of the kind between an Indo-Europeanand non-Indo-Europeanlanguage for any signiicantvocabulary and corpus size. The learned vocabulary size is abo...

متن کامل

A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-parallel Corpora

We present two problems for statistically extracting bilingual lexicon: (1) How can noisy parallel corpora be used? (2) How can non-parallel yet comparable corpora be used? We describe our own work and contribution in relaxing the constraint of using only clean parallel corpora. DKvec is a method for extracting bilingual lexicons, from noisy parallel corpora based on arrival distances of words ...

متن کامل

Learning an English-chinese Lexicon from a Parallel Corpus

We report experiments on automatic learning of an English-Chinese translation lexicon, through statistical training on a large parallel corpus. The learned vocabulary size is nontrivial at 6,517 English words averaging 2.33 Chinese translations per entry, with a manuallyfiltered precision of 95.1% and a single-most-probable precision of 91.2%. We then introduce a significance filtering method t...

متن کامل

Construction of a Chinese-english Verb Lexicon for Machine Translation

This paper addresses the problem of automatic acquisition of lexical knowledge for rapid construction of MT engines in multilingual applications. We describe new techniques for large-scale construction of a Chinese-English verb lexicon and we evaluate the coverage and eeectiveness of the resulting lexicon. Leveraging oo an existing Chinese conceptual database called HowNet and a large, semantic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998